Weaving sloWNet using window-based co-occurrence features

Authors

  • Darja Fišer
  • Maciej Piasecki
  • Bartosz Broda
Abstract

This paper presents the first results of using statistical methods and linguistically annotated corpus data to extract lists of semantically similar words, which are then incorporated into an existing wordnet for Slovene. The approach was originally developed for Polish, but it is attractive for other languages as well: apart from a large corpus, it requires only minimal NLP tools and resources. It can therefore be applied easily to any language that still lacks an extensive wordnet or a similar semantic lexicon. Another important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, which yields a linguistically sound organization of the vocabulary. Because all previous approaches to constructing the Slovene wordnet were transfer-based and took their structure from the English Princeton WordNet, the encouraging results of this experiment will be a welcome complement to the existing semantic network.

Slovene title: Spletanje sloWNeta na podlagi informacij o sopojavljanju besed v korpusu
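To illustrate what "window-based co-occurrence features" can mean, here is a minimal sketch of a generic distributional-similarity pipeline. It assumes a lemmatized corpus given as a list of token lists. The window size, the use of raw counts, and cosine similarity are illustrative stand-ins chosen here; the abstract does not specify the authors' feature weighting or similarity measure. The function names `cooccurrence_vectors`, `cosine`, and `most_similar` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each lemma to a Counter of lemmas seen within +/- `window` positions.

    Illustrative only: raw counts, no weighting or frequency filtering.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the target word itself
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[feat] for feat, count in u.items() if feat in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, k=5):
    """Return the k lemmas whose co-occurrence vectors are closest to `word`'s."""
    if word not in vectors:
        return []
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus of lemmas (not data from the paper).
    corpus = [
        ["the", "dog", "chase", "the", "ball"],
        ["the", "cat", "chase", "the", "ball"],
        ["the", "car", "need", "new", "fuel"],
    ]
    vecs = cooccurrence_vectors(corpus, window=1)
    print(most_similar("dog", vecs, k=3))  # "cat" ranks first
```

In this toy corpus, "dog" and "cat" share identical contexts and therefore rank as most similar. Lists of nearest neighbours produced this way are the kind of candidate material that could then be checked against, and attached to, existing wordnet synsets.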



Publication date: 2012